Exact distributions of the number of distinct and common sites visited by $N$ independent random walkers
We study the number of distinct sites $S_N(t)$ and common sites $W_N(t)$ visited by $N$ independent one dimensional random walkers, all starting at the origin, after $t$ time steps. We show that these two random variables can be mapped onto extreme value quantities associated to $N$ independent random walkers. Using this mapping, we compute exactly their probability distributions $P_N^d(S,t)$ and $P_N^c(W,t)$ for any value of $N$ in the limit of large time $t$, where the random walkers can be described by Brownian motions. In the large $N$ limit one finds that $S_N(t)/\sqrt{t} \propto 2\sqrt{\log N} + \tilde{s}/(2\sqrt{\log N})$ and $W_N(t)/\sqrt{t} \propto \tilde{w}/N$, where $\tilde{s}$ and $\tilde{w}$ are random variables whose probability density functions (pdfs) are computed exactly and are found to be non trivial. We verify our results through direct numerical simulations.

In elementary set theory, two fundamental concepts are the union and the intersection of $N$ sets. While the union consists of all distinct elements of the collection of sets, the intersection consists of the elements common to all the sets. These two notions appear naturally in everyday life: for example, the area of common knowledge or the whole range of different interests amongst the members of a society would define respectively its stability and activity. In a habitat of $N$ animals, the union of the territories covered by different animals sets the geographical range of the habitat, while the intersection refers to the common area (e.g. a water body) frequented by all animals.

In statistical physics, these two objects are modeled respectively by the number of distinct and common sites visited by $N$ random walkers (RWs). The knowledge about the number of distinct sites has applications ranging from the annealing of defects in crystals [1, 2] and relaxation processes [3-6] to the spread of populations in ecology [7, 8] or to the dynamics of web annotation systems [9]. Similarly, the knowledge about the common area frequented by endangered animals is very useful for their daily health care. Likewise, in the energy transport through a series of independent disordered samples, the energy output will depend on the number of energy levels common to all these materials.

Dvoretzky and Erdős first studied the average number of distinct sites $\langle S_1(t)\rangle$ visited by a single $t$-step RW in $d$ dimensions, a quantity subsequently studied in [11-13]. Larralde et al. generalized this to $N$ independent $t$-step walkers moving on a $d$-dimensional lattice [14]. They found three regimes of growth (early, intermediate and late) for the average number of distinct sites $\langle S_N(t)\rangle$ as a function of time. These three regimes are separated by two $N$-dependent time scales [14]. In particular they showed that in $d = 1$ and for $t \gg \sqrt{\log N}$, $\langle S_N(t)\rangle \sim \sqrt{4 D\, t \log N}$, where $D$ is the diffusion constant of a single walker. Recently Majumdar and Tamm [15] studied the complementary quantity, namely the number of common sites $W_N(t)$ visited by $N$ walkers, each of $t$ steps, and found analytically a rich asymptotic late time growth of the average $\langle W_N(t)\rangle$. They showed that in the $(N, d)$ plane there are three distinct phases separated by two critical lines $d = 2$ and $d_c(N) = 2N/(N-1)$, with $\langle W_N(t)\rangle \sim t^{\nu}$ at late times, where the growth exponent is $\nu = d/2$ [for $d < 2$], $\nu = N - d(N-1)/2$ [for $2 < d < d_c(N)$] and $\nu = 0$ [for $d > d_c(N)$] (see also [16]). In particular, in $d = 1$, $\langle W_N(t)\rangle \propto \sqrt{4Dt}$, where the prefactor depends on $N$. However, most of these studies were limited to the average number of distinct or common sites, and there exists virtually no information about their full probability distributions, e.g. the probabilities $P_N^d(S,t)$ that $S_N(t) = S$ and $P_N^c(W,t)$ that $W_N(t) = W$.

Computing these distributions for general $d$-dimensional space is highly non trivial. Indeed, although the $N$ walkers are independent, conditioning their trajectories to a given number of distinct (or common) visited sites introduces strong effective correlations between them. In $d = 1$, we show here that these random variables $S_N(t)$ and $W_N(t)$ can be mapped onto extreme values (nearest and furthest displacements) associated to $N$ independent walkers. This connection to extreme value statistics (EVS) allows us to compute $P_N^d(S,t)$ and $P_N^c(W,t)$ exactly for $t$ large and arbitrary $N$. We show that the induced correlations between the walkers persist even for $N \to \infty$, where the limiting distributions are not given by EVS of independent random variables, as erroneously argued in the previous study of $S_N(t)$ [14].

We consider $N$ independent and identical $t$-step RWs $x_1(\tau), x_2(\tau), \cdots, x_N(\tau)$ on a 1-d lattice, all starting at the origin. For convenience, we set the diffusion constant of the walkers $D = 1/2$. Distinct sites are those that are visited at least once by at least one of the $N$ walkers [14], while common sites correspond to sites visited individually at least once by all the $N$ walkers [15]. We denote by $M_i$ and $m_i$ respectively the maximum and the minimum displacements of the $i$-th walker $x_i$ up to time $t$. The number of distinct sites visited, $S_N$ [17], is then the sum of the range on the positive (+ve) side, $M_+$, and the range on the negative (-ve) side, $m_-$ (see Fig. 1):

$$S_N = M_+ + m_-, \quad M_+ = \max_{1 \le i \le N} M_i, \quad m_- = -\min_{1 \le i \le N} m_i. \tag{1}$$

Similarly, the number of common sites visited, $W_N$, is the common span $M_-$ on the +ve axis plus the common span $m_+$ on the -ve axis:

$$W_N = M_- + m_+, \quad M_- = \min_{1 \le i \le N} M_i, \quad m_+ = -\max_{1 \le i \le N} m_i. \tag{2}$$

FIG. 1. (Color online) Schematic diagram of 2 independent RWs, where $M_+$, $M_-$, $m_+$, $m_-$ and $S_2$, $W_2$ are shown (see Eq. (1)).

Eqs. (1) and (2) establish a precise connection between $S_N$ and $W_N$ and the EVS of $N$ independent RWs.
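As an illustration of the mapping in Eqs. (1) and (2), the following minimal Python sketch counts distinct and common sites of a few lattice walkers in two ways: directly, from the union and intersection of the visited sets, and from the extremes $M_\pm$, $m_\pm$. The function names are hypothetical. The `+ 1` that accounts for the origin is a bookkeeping assumption of this sketch; it does not appear in Eqs. (1) and (2) and is negligible at large $t$.

```python
import random

def simulate_walkers(n_walkers, n_steps, rng):
    """Return, for each walker, the set of lattice sites it visits (origin included)."""
    visited = []
    for _ in range(n_walkers):
        x, sites = 0, {0}
        for _ in range(n_steps):
            x += rng.choice((-1, 1))  # nearest-neighbour step
            sites.add(x)
        visited.append(sites)
    return visited

def counts_from_sets(visited):
    """Distinct and common site counts from the union and the intersection."""
    distinct = set.union(*visited)
    common = set.intersection(*visited)
    return len(distinct), len(common)

def counts_from_extremes(visited):
    """Same counts from the extremes, following Eqs. (1) and (2)."""
    maxima = [max(s) for s in visited]  # M_i
    minima = [min(s) for s in visited]  # m_i (non-positive)
    M_plus, m_minus = max(maxima), -min(minima)
    M_minus, m_plus = min(maxima), -max(minima)
    # "+ 1" counts the origin itself (assumption of this sketch)
    return M_plus + m_minus + 1, M_minus + m_plus + 1

if __name__ == "__main__":
    rng = random.Random(1)
    walkers = simulate_walkers(n_walkers=3, n_steps=1000, rng=rng)
    print(counts_from_sets(walkers), counts_from_extremes(walkers))
```

The two counts agree because a nearest-neighbour walker in one dimension visits every site of the interval $[m_i, M_i]$, so the union and the intersection of the visited sets are themselves intervals.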

In the limit of large $t$, the lattice RWs converge to Brownian motions (BMs). Hence for large $t$, the probability distributions $P_N^d(S,t)$ and $P_N^c(W,t)$ take the scaling form

$$P_N^d(S,t) \simeq \frac{1}{\sqrt{2t}}\, p_N^d\!\left(\frac{S}{\sqrt{2t}}\right), \quad P_N^c(W,t) \simeq \frac{1}{\sqrt{2t}}\, p_N^c\!\left(\frac{W}{\sqrt{2t}}\right), \tag{3}$$

where $p_N^d(s)$ is the probability density function (pdf) of the span or range, $s = S/\sqrt{2t}$, and $p_N^c(w)$ is the pdf of the common span or common range, $w = W/\sqrt{2t}$, for $N$ independent BMs (see Fig. 1) on the unit time interval [18]. The rescaled quantities $S_N/\sqrt{2t}$ and $W_N/\sqrt{2t}$ in (3) are given by (1) and (2), where $M_\pm$, $m_\pm$ are replaced by their counterparts $\tilde{M}_\pm = M_\pm/\sqrt{2t}$ and $\tilde{m}_\pm = m_\pm/\sqrt{2t}$ corresponding to $N$ independent BMs on the unit time interval.

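It is useful to summarize our main results. We obtain exactly, for any $N$, the pdfs $p_N^d(s)$ and $p_N^c(w)$ as presented in (10) and (11) along with (8) and (9). The moments can also be computed explicitly [19]. The tails of the pdfs can be derived explicitly:

$$p_N^d(s) \sim \begin{cases} a_N\, s^{-5} \exp\left[-N\pi^2/(4 s^2)\right], & s \to 0, \\ b_N \exp\left(-s^2/2\right), & s \to \infty, \end{cases} \tag{4}$$

and

$$p_N^c(w) \sim \begin{cases} c_N\, w, & w \to 0, \\ d_N\, w^{1-N} \exp\left(-N w^2\right), & w \to \infty, \end{cases} \tag{5}$$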



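where $a_N$, $b_N$, $c_N$ and $d_N$ are computable constants (see below). For $N \to \infty$, one finds that both pdfs approach a non trivial limiting form

$$p_N^d(s) \to 2\sqrt{\log N}\; \mathcal{D}\!\left(2\sqrt{\log N}\,(s - 2\sqrt{\log N})\right), \quad \mathcal{D}(\tilde{s}) = 2\, e^{-\tilde{s}} K_0\!\left(2 e^{-\tilde{s}/2}\right), \tag{6}$$

where $K_\nu(x)$ denote the modified Bessel functions, and

$$p_N^c(w) \to N\, \mathcal{C}(N w), \quad \mathcal{C}(\tilde{w}) = \frac{4}{\pi}\, \tilde{w}\, e^{-2\tilde{w}/\sqrt{\pi}}, \quad \tilde{w} > 0. \tag{7}$$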



Note that $\mathcal{D}(\tilde{s})$ in (6) is not the Gumbel distribution, as it was initially argued in [14]. Remarkably, the same distribution $\mathcal{D}(\tilde{s})$ also appears as the limiting distribution of the maximum of a large collection of logarithmically correlated random variables on a circle [20]. We check indeed that $\int_{-\infty}^{\tilde{s}} \mathcal{D}(s')\, ds' = 2 e^{-\tilde{s}/2} K_1(2 e^{-\tilde{s}/2})$, as obtained in [20]. Incidentally, logarithmically correlated random variables have been the subject of several recent studies because they exhibit freezing phenomena, akin to the replica symmetry breaking scenario found in mean field spin glass models. As a byproduct of our computation, we show that $\mathcal{D}(\tilde{s})$ is the convolution of two independent Gumbel distributions.

We start by computing the joint cumulative distribution function (jcdf) $P_d(l_1, l_2) = \Pr[\tilde{M}_+ \le l_1, \tilde{m}_- \le l_2]$, relevant for $p_N^d(s)$, and the jcdf $P_c(j_1, j_2) = \Pr[\tilde{M}_- \ge j_1, \tilde{m}_+ \ge j_2]$, relevant for $p_N^c(w)$. Since all the $N$ BMs are identical and independent, $P_d(l_1, l_2) = g^N(l_1, l_2)$, where $g(l_1, l_2) = \Pr(\tilde{M} \le l_1, \tilde{m} \ge -l_2)$ is the jcdf of the maximum $\tilde{M}$ and the minimum $\tilde{m}$ for a single BM on the unit time interval. It can be computed by the standard method of images [23]:

$$g(l_1, l_2) = \frac{4}{\pi} \sum_{n=0}^{\infty} \frac{1}{2n+1} \sin\!\left[\frac{(2n+1)\pi\, l_2}{l_1 + l_2}\right] \exp\!\left[-\left(\frac{(2n+1)\pi}{2(l_1 + l_2)}\right)^2\right]. \tag{8}$$

Similarly, $P_c(j_1, j_2) = h^N(j_1, j_2)$, where $h(j_1, j_2) = \Pr[\tilde{M} \ge j_1, \tilde{m} \le -j_2]$ reads:

$$h(j_1, j_2) = 1 - \mathrm{erf}(j_1) - \mathrm{erf}(j_2) + g(j_1, j_2), \tag{9}$$

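where $\mathrm{erf}(x) = (2/\sqrt{\pi}) \int_0^x e^{-y^2} dy$, $\mathrm{erf}(j_1) = \Pr(\tilde{M} \le j_1)$ and $\mathrm{erf}(j_2) = \Pr(\tilde{m} \ge -j_2)$. From the joint pdf $\partial^2 P_d / \partial l_1 \partial l_2$ and using (1), we obtain

$$p_N^d(s) = \int_0^\infty dl_1 \int_0^\infty dl_2\; \delta(s - l_1 - l_2)\, \frac{\partial^2 g^N}{\partial l_1 \partial l_2}, \tag{10}$$

with $g = g(l_1, l_2)$. Similarly, from the joint pdf $\partial^2 P_c / \partial j_1 \partial j_2$ and using (2), we obtain

$$p_N^c(w) = \int_0^\infty dj_1 \int_0^\infty dj_2\; \delta(w - j_1 - j_2)\, \frac{\partial^2 h^N}{\partial j_1 \partial j_2}, \tag{11}$$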

with $h = h(j_1, j_2)$. For small values of $N$, the double integrals in (10) and (11) can be performed explicitly and numerical simulations confirm these exact results [19]. Below we provide a physical interpretation of these formulas (10, 11) and perform, separately, their asymptotic analysis both for small and large arguments. We also analyze their limiting form for $N \to \infty$.
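For readers who wish to experiment with (8) and (9), here is a minimal Python sketch that evaluates the single-walker functions $g$ and $h$ by truncating the series in (8). The function names and the truncation parameter `n_terms` are choices of this sketch, not of the original analysis; the series converges quickly because of the Gaussian factor, so a few hundred terms are ample for arguments of order one.

```python
import math

def g(l1, l2, n_terms=400):
    """Truncated series of Eq. (8): Pr(max <= l1, min >= -l2) for one rescaled BM."""
    span = l1 + l2
    total = 0.0
    for n in range(n_terms):
        k = 2 * n + 1
        total += (math.sin(k * math.pi * l2 / span) / k
                  * math.exp(-(k * math.pi / (2.0 * span)) ** 2))
    return 4.0 / math.pi * total

def h(j1, j2, n_terms=400):
    """Eq. (9): Pr(max >= j1, min <= -j2) for one rescaled BM."""
    return 1.0 - math.erf(j1) - math.erf(j2) + g(j1, j2, n_terms)

if __name__ == "__main__":
    print(g(1.0, 1.0), h(0.5, 0.5))
```

Two quick consistency checks follow from the text: for a large box $g$ approaches $1 - \mathrm{erfc}(l_1) - \mathrm{erfc}(l_2)$, and for small arguments $h \simeq 1 - (2/\sqrt{\pi})(j_1 + j_2)$.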




FIG. 2. (Color online) Plot of $p_N^d(s)/(2\sqrt{\log N})$ as a function of $\tilde{s} = 2\sqrt{\log N}\,(s - 2\sqrt{\log N})$. The dotted line indicates the exact asymptotic result for $N \to \infty$, $\mathcal{D}(\tilde{s})$ in (6). Inset: Plot of $p_{10}^d(s)$, obtained from simulation, compared with its asymptotic behavior (4).

Distinct sites: To find the tails of $p_N^d(s)$ at small and large $s$ for finite $N$, we rewrite (10) as

$$p_N^d(s) = \int_0^s dl_2\, \Psi_d(s - l_2, l_2), \quad \text{where} \quad \Psi_d(l_1, l_2) = N g^{N-1} \frac{\partial^2 g}{\partial l_1 \partial l_2} + N(N-1)\, g^{N-2}\, \frac{\partial g}{\partial l_1} \frac{\partial g}{\partial l_2}. \tag{12}$$

We interpret the two contributions in $\Psi_d(l_1, l_2)$ as follows: the first term corresponds to a configuration where one particle explores a region $[-l_2, s - l_2]$ (we call it a box) of size $s$ in the unit time interval, such that its maximum is at $s - l_2$ and its minimum is at $-l_2$, while all the other $(N-1)$ particles stay inside this box. On the other hand, the second term corresponds to a configuration where two particles create, in a different way, the same box $[-l_2, s - l_2]$ of size $s$: one of the two particles has its maximum at $s - l_2$ and its minimum larger than $-l_2$, while the second particle has its minimum at $-l_2$ and its maximum below $s - l_2$, and all the other $(N-2)$ particles stay strictly inside this box.

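When $s \to 0$ in (12), one can replace $g(l_1, l_2)$ in (8) by its asymptotic behavior when $l_1, l_2 \to 0$, where $g(l_1, l_2) \simeq \frac{4}{\pi} \sin\!\left(\frac{\pi\, l_2}{l_1 + l_2}\right) e^{-\pi^2/[4(l_1 + l_2)^2]}$. Inserting it in (12), we see that both terms in (12) contribute equally. After integration over $l_2$, one then obtains the result announced in (4) for $s \to 0$ with $a_N = 4\pi^{3/2}\, N(N-1) \left(\frac{4}{\pi}\right)^{N-2} \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N}{2}\right)}$, where $\Gamma(x)$ is the Gamma function.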

To perform the large $s$ asymptotic analysis of $p_N^d(s)$ we use the Poisson summation formula: $g(l_1, l_2) = \sum_{m=0}^{\infty} (-1)^m \left[\mathrm{erf}\left[m(l_1 + l_2) + l_1\right] + \mathrm{erf}\left[m(l_1 + l_2) + l_2\right]\right]$. We use this form to evaluate the integrand in (12) in the limit $s \to \infty$. We see that the first term in (12), which corresponds to creating a box $[-l_2, s - l_2]$ with one particle, decreases as $e^{-(s + l_2)^2} e^{-l_2^2}$, whereas the second term, where the same box is created by two particles, decreases as $e^{-(s - l_2)^2} e^{-l_2^2}$. Since $l_2$ is always positive, the two-particle term wins over the one-particle term when $s \to \infty$: this is physically understandable because creating a very large span with two particles is more likely than creating the same one with a single particle. It also follows from this analysis that the integral over $l_2$ in (12) is dominated by $l_2 \sim O(s)$, which finally yields the large $s$ behavior announced in (4) with $b_N = 2N(N-1)/\sqrt{\pi}$. In Fig. 2 we verify that the small and large $s$ asymptotics of $p_N^d(s)$ given in (4), for $N = 10$, describe very well, without any fitting parameter, the distribution obtained from direct simulation.

What happens for large $N$? The typical scale of the fluctuations of $S_N/\sqrt{2t}$ can be estimated from the relations with EVS (1). The variables $\tilde{M}_i$, with $i = 1, \cdots, N$, which are the maxima of the $i$-th BM on the unit interval, are i.i.d. variables. Their common pdf is known to be a half-Gaussian, $p(\tilde{M}) = (2/\sqrt{\pi})\, e^{-\tilde{M}^2}$, $\tilde{M} > 0$. The same holds for the variables $-\tilde{m}_i$. Hence, for large $N$, standard results of EVS [25] state that the typical value of $\tilde{M}_+ = \max_{1 \le i \le N} \tilde{M}_i$ is $O(\sqrt{\log N})$ while its fluctuations are of order $1/\sqrt{\log N}$ and governed by a Gumbel distribution. The same also holds for $\tilde{m}_- = -\min_{1 \le i \le N} \tilde{m}_i$. For large $N$, these two extremes become uncorrelated as the global maximum and global minimum are most likely reached by two independent walkers. Hence one gets

$$P_d(l_1, l_2) \simeq \exp\!\left[-e^{-2\mu_N (l_1 - \mu_N)}\right] \exp\!\left[-e^{-2\mu_N (l_2 - \mu_N)}\right], \tag{13}$$

with $\mu_N = \sqrt{\log N}$. Inserting (13) in (10) with $\tilde{s} = 2\mu_N (s - 2\mu_N)$ one finds

$$p_N^d(s) \simeq 2\sqrt{\log N} \int_{-\infty}^{\infty} dl_2\; e^{-l_2 - e^{-l_2}}\; e^{-(\tilde{s} - l_2) - e^{-(\tilde{s} - l_2)}}, \tag{14}$$



which can be evaluated explicitly to give (6). In Fig. 2 we plot $p_N^d(s)/(2\sqrt{\log N})$ against $\tilde{s}$ for $N = 50$ and $100$. They show a relatively good agreement with the exact result $\mathcal{D}(\tilde{s})$ after an overall shift of order $O(1/\log N)$ along the x-axis, thus revealing, as expected, a slow convergence towards the asymptotic result. In [14] the authors argued that the limiting distribution should be a Gumbel distribution, overlooking the fact that it is actually the convolution of two Gumbel distributions, as in (14). In particular, for large $\tilde{s}$, $\mathcal{D}(\tilde{s}) \sim \tilde{s}\, e^{-\tilde{s}}$, while the Gumbel distribution decays as a pure exponential.
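The step from the convolution of two Gumbel densities to the Bessel form of $\mathcal{D}(\tilde{s})$ can be made explicit. The following worked derivation is a sketch that assumes the standard Gumbel density $e^{-x - e^{-x}}$ for each extreme and uses the integral representation $\int_0^\infty \frac{du}{u}\, e^{-u - z^2/(4u)} = 2 K_0(z)$:

```latex
\begin{align}
\mathcal{D}(\tilde{s})
  &= \int_{-\infty}^{\infty} dl\; e^{-l - e^{-l}}\; e^{-(\tilde{s}-l) - e^{-(\tilde{s}-l)}}
   = e^{-\tilde{s}} \int_{-\infty}^{\infty} dl\;
     \exp\!\left[-e^{-l} - e^{-\tilde{s}}\, e^{l}\right] \\
  &= e^{-\tilde{s}} \int_{0}^{\infty} \frac{du}{u}\;
     \exp\!\left[-u - \frac{e^{-\tilde{s}}}{u}\right]
     \qquad (u = e^{-l}) \\
  &= 2\, e^{-\tilde{s}}\, K_0\!\left(2\, e^{-\tilde{s}/2}\right)
     \qquad \left(z = 2 e^{-\tilde{s}/2}\right).
\end{align}
% Large-argument tail: K_0(z) \simeq -\log(z/2) for z \to 0, hence
% \mathcal{D}(\tilde{s}) \simeq \tilde{s}\, e^{-\tilde{s}} as \tilde{s} \to \infty.
```

The logarithmic singularity of $K_0$ at small argument is what produces the extra factor $\tilde{s}$ in the tail, which a single Gumbel density does not have.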




FIG. 3. (Color online) Plot of $p_N^c(w)/N$ as a function of $\tilde{w} = N w$. The dotted line indicates the exact asymptotic result for $N \to \infty$, $\mathcal{C}(\tilde{w})$ in (7). Inset: Plot of $p_3^c(w)$, obtained from simulation, compared with its asymptotic behavior (5).

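Common sites: To find the small and large $w$ asymptotics of $p_N^c(w)$ we write (11) as

$$p_N^c(w) = \int_0^w dj_2\, \Psi_c(w - j_2, j_2), \quad \text{where} \quad \Psi_c(j_1, j_2) = N h^{N-1} \frac{\partial^2 h}{\partial j_1 \partial j_2} + N(N-1)\, h^{N-2}\, \frac{\partial h}{\partial j_1} \frac{\partial h}{\partial j_2}. \tag{15}$$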

In (15), one interprets the first term as one single particle creating a common span $[-j_2, w - j_2]$ of size $w$ and the second term as two particles collaboratively creating the same common span (in a unit time interval) [19]. In both cases, the remaining particles are such that their maxima are above $w - j_2$ and their minima are below $-j_2$. When $w \to 0$ in (15), $h(j_1, j_2)$ can be replaced by its asymptotic behavior for small $j_1, j_2$: $h(j_1, j_2) \simeq 1 - \frac{2}{\sqrt{\pi}} (j_1 + j_2)$. Integrating then over $j_2$ in (15) yields the small $w$ behavior in (5) with $c_N = 4N(N-1)/\pi$. Note that for very small $w$, it is much more likely to create a box of size smaller than $w$ with two particles (which occurs with a probability $\propto w^2$) than with a single one [which occurs with probability $\propto \exp(-\pi^2/4w^2)$]. The former configurations thus dominate for small $w$.
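The value of the small-$w$ constant can be traced in two lines. The following worked step is a sketch at leading order in $w$, assuming the linear approximation $h \simeq 1 - (2/\sqrt{\pi})(j_1 + j_2)$ and neglecting the one-particle term, whose mixed derivative $\partial^2 h/\partial j_1 \partial j_2 = \partial^2 g/\partial j_1 \partial j_2$ is exponentially small in this regime:

```latex
\begin{align}
\frac{\partial h}{\partial j_1} \simeq \frac{\partial h}{\partial j_2} \simeq -\frac{2}{\sqrt{\pi}}
  \quad &\Longrightarrow \quad
  \Psi_c(j_1, j_2) \simeq N(N-1)\, h^{N-2}\,
  \frac{\partial h}{\partial j_1} \frac{\partial h}{\partial j_2}
  \simeq \frac{4 N (N-1)}{\pi}, \\
p_N^c(w) \simeq \int_0^w dj_2\; \frac{4 N (N-1)}{\pi}
  &= \frac{4 N (N-1)}{\pi}\; w \;=\; c_N\, w .
\end{align}
```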

To get the large $w$ behavior of $p_N^c(w)$, we estimate $h(j_1, j_2)$ for large $j_1$ in (15). This is conveniently done by using the Poisson formula, which yields $h(j_1, j_2) \simeq \mathrm{erfc}(2 j_1 + j_2) + \mathrm{erfc}(j_1 + 2 j_2)$. This estimate shows that for $w \gg \sqrt{\log N}$, the second term in (15) becomes subdominant compared to the first one. Hence for very large $w$ the leading contribution comes from the first term, where we replace $h^{N-1}(w - j_2, j_2) \simeq \left[\mathrm{erfc}(w + j_2) + \mathrm{erfc}(2w - j_2)\right]^{N-1}$ by $\mathrm{erfc}^{N-1}(w)$, as one can show that the integral over $j_2$ in (15) is dominated by the vicinity of $j_2 \to 0$ [19]. This leads to the large $w$ behavior in (5) with $d_N = 8N/\pi^{N/2}$. The asymptotic behaviors of $p_N^c(w)$ in (5) have been verified numerically for $N = 3$ in Fig. 3.

To obtain the typical scale of $W_N/\sqrt{2t}$ for large $N$, we use its relation to EVS (2). From standard EVS for i.i.d. random variables [25], we know that $\tilde{M}_- = \min_{1 \le i \le N} \tilde{M}_i$, where the $\tilde{M}_i > 0$ are distributed according to a half-Gaussian, is of order $O(N^{-1})$. Its pdf is given by a Weibull law, which is here an exponential distribution [26]. Indeed one has here $\Pr(N \tilde{M}_- \ge x) \to e^{-2x/\sqrt{\pi}}$, $x > 0$, as $N \to \infty$. The same holds for $\tilde{m}_+$, which for large $N$ becomes independent of $\tilde{M}_-$ as both of them are reached by two independent walkers. Hence, from (2), $N W_N/\sqrt{2t}$ is given by the convolution of two exponential laws:

$$p_N^c(w) \simeq N^2\, \frac{4}{\pi}\, e^{-2 N w/\sqrt{\pi}} \int_0^w dk = N\, \mathcal{C}(N w), \tag{16}$$



with $\mathcal{C}(\tilde{w})$ as announced in (7). We have also obtained this result [19] by a direct large $N$ expansion of (15). In Fig. 3 we plot $p_N^c(w)/N$ against $\tilde{w}$ for $N = 10$, $20$ and $30$ and see that they all approach the function $\mathcal{C}(\tilde{w})$, although the convergence is rather slow.
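The large $N$ argument above can be probed with a small Monte Carlo experiment. The Python sketch below is an illustration under a simplifying assumption: it treats $\tilde{M}_-$ and $\tilde{m}_+$ as two independent minima of $N$ i.i.d. half-Gaussian variables, which is only the asymptotic picture described in the text, not a simulation of the walkers themselves. Function names and parameter values are hypothetical.

```python
import math
import random

def scaled_min_half_gaussian(n_walkers, rng):
    """N times the minimum of N i.i.d. half-Gaussians with pdf (2/sqrt(pi)) exp(-x^2)."""
    sigma = 1.0 / math.sqrt(2.0)  # |Normal(0, 1/2)| has the required pdf
    return n_walkers * min(abs(rng.gauss(0.0, sigma)) for _ in range(n_walkers))

def sample_scaled_common_span(n_walkers, n_samples, seed=0):
    """Samples of N*(M_- + m_+), with the two minima drawn independently (assumption)."""
    rng = random.Random(seed)
    return [scaled_min_half_gaussian(n_walkers, rng)
            + scaled_min_half_gaussian(n_walkers, rng)
            for _ in range(n_samples)]

def limiting_pdf(w):
    """C(w) of Eq. (7): convolution of two exponentials of rate 2/sqrt(pi)."""
    return 4.0 / math.pi * w * math.exp(-2.0 * w / math.sqrt(math.pi))

if __name__ == "__main__":
    samples = sample_scaled_common_span(n_walkers=100, n_samples=3000)
    mean = sum(samples) / len(samples)
    # The mean of C(w) is sqrt(pi), about 1.77
    print(mean, math.sqrt(math.pi))
```

Comparing a histogram of the samples with `limiting_pdf` gives a quick visual check of (16); the residual discrepancy at finite $N$ reflects the slow convergence noted in the text.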
Conclusion: We have achieved a complete analytic description of the pdfs of the number of distinct and common sites visited by $N$ independent RWs after $t$ time steps, for large $t$. We have also obtained interesting limiting distributions (6, 7) in the limit when $N \to \infty$. For distinct sites, we found an intriguing connection with the maximum of logarithmically correlated random variables on a circle [20].

One may wonder about the effects of interactions between the walkers. For instance, one can study non-intersecting (vicious) RWs [27]. An interesting situation is the case where all $N$ walkers start and end at the same point, while staying positive in the time interval $[0, t]$ (watermelons with a wall). In this case, the number of distinct sites $S_N/\sqrt{2t}$ corresponds to the maximal height of these watermelons [28]. For large $N$, the pdf of $S_N/\sqrt{2t}$, properly shifted and scaled, converges to the Tracy-Widom distribution $\mathcal{F}_1$ [29], which describes the fluctuations of the largest eigenvalue of Gaussian orthogonal random matrices. On the other hand, the number of common sites $W_N/\sqrt{2t}$ is related to the maximum of the lower path, the distribution of which is a very interesting open problem [30].